 filter condition


NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

Cao, Ruisheng, Zhang, Hanchong, Huang, Tiancheng, Kang, Zhangyi, Zhang, Yuxin, Sun, Liangtai, Li, Hanqi, Miao, Yuxun, Fan, Shuai, Chen, Lu, Yu, Kai

arXiv.org Artificial Intelligence

The increasing number of academic papers poses significant challenges for researchers to efficiently acquire key details. While retrieval-augmented generation (RAG) shows great promise in large language model (LLM) based automated question answering, previous works often isolate neural and symbolic retrieval despite their complementary strengths. Moreover, conventional single-view chunking neglects the rich structure and layout of PDFs, e.g., sections and tables. In this work, we propose NeuSym-RAG, a hybrid neural symbolic retrieval framework which combines both paradigms in an interactive process. By leveraging multi-view chunking and schema-based parsing, NeuSym-RAG organizes semi-structured PDF content into both a relational database and a vectorstore, enabling LLM agents to iteratively gather context until it is sufficient to generate answers. Experiments on three full PDF-based QA datasets, including a self-annotated one, AIRQA-REAL, show that NeuSym-RAG consistently outperforms both the vector-based RAG and various structured baselines, highlighting its capacity to unify both retrieval schemes and utilize multiple views. Code and data are publicly available at https://github.com/X-LANCE/NeuSym-RAG.


Shedding Light on the Polymer's Identity: Microplastic Detection and Identification Through Nile Red Staining and Multispectral Imaging (FIMAP)

Ho, Derek, Feng, Haotian

arXiv.org Artificial Intelligence

The widespread distribution of microplastics (MPs) in the environment presents significant challenges for their detection and identification. Fluorescence imaging has emerged as a promising technique for enhancing plastic particle detectability and enabling accurate classification based on fluorescence behavior. However, conventional segmentation techniques face limitations, including poor signal-to-noise ratio, inconsistent illumination, thresholding difficulties, and false positives from natural organic matter (NOM). To address these challenges, this study introduces the Fluorescence Imaging Microplastic Analysis Platform (FIMAP), a retrofitted multispectral camera with four optical filters and five excitation wavelengths. FIMAP enables comprehensive characterization of the fluorescence behavior of ten Nile Red-stained MPs: HDPE, LDPE, PP, PS, EPS, ABS, PVC, PC, PET, and PA, while effectively excluding NOM. Using K-means clustering for robust segmentation (Intersection over Union = 0.877) and a 20-dimensional color coordinate multivariate nearest neighbor approach for MP classification (>3.14 mm), FIMAP achieves 90% precision, 90% accuracy, 100% recall, and an F1 score of 94.7%. Only PS was occasionally misclassified as EPS. For smaller MPs (35-104 microns), classification accuracy declined, likely due to reduced stain sorption, fewer detectable pixels, and camera instability. Integrating FIMAP with higher-magnification instruments, such as a microscope, may enhance MP identification. This study presents FIMAP as an automated, high-throughput framework for detecting and classifying MPs across large environmental sample volumes.
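As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the F1 score from the reported precision and recall, and shows how an Intersection over Union value is obtained for two binary masks. The function names and the toy masks are illustrative assumptions and are not taken from the FIMAP work.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iou(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Intersection over Union of two equally sized binary masks (flat 0/1 sequences)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty masks overlap perfectly by convention.
    return intersection / union if union else 1.0


# Precision and recall as reported in the abstract (90% and 100%).
reported_f1 = f1_score(0.90, 1.00)

# Toy masks (hypothetical): one shared pixel out of three covered pixels.
toy_iou = iou([1, 1, 0, 0], [1, 0, 1, 0])
```

With 90% precision and 100% recall the harmonic mean is 1.8 / 1.9, which rounds to the 94.7% stated in the abstract.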


Canonical Form of Datatic Description in Control Systems

Zhan, Guojian, Zheng, Ziang, Li, Shengbo Eben

arXiv.org Artificial Intelligence

The design of feedback controllers is undergoing a paradigm shift from modelic (i.e., model-driven) control to datatic (i.e., data-driven) control. The canonical form of a state-space model is an important concept in modelic control systems, exemplified by the Jordan form, controllable form, and observable form, whose purpose is to facilitate system analysis and controller synthesis. In the realm of datatic control, there is a notable absence of standardization in data-based system representation. This paper for the first time introduces the concept of canonical data form for the purpose of achieving more effective design of datatic controllers. In a control system, the data sample in canonical form consists of a transition component and an attribute component. The former encapsulates the plant dynamics at the sampling time independently, and is a tuple containing three elements: a state, an action, and their corresponding next state. The latter describes one or more artificial characteristics of the current sample, whose calculation must be performed in an online manner. The attribute of each sample must adhere to two requirements: (1) causality, ensuring independence from any future samples; and (2) locality, allowing dependence on historical samples but constrained to a finite neighboring set. The purpose of adding the attribute is to benefit controller design in terms of effectiveness and efficiency. To provide a more concrete illustration, we present two canonical data forms, the temporal form and the spatial form, and demonstrate their advantages in reducing instability and enhancing training efficiency in two datatic control systems.
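The sample layout described above (a transition tuple plus an attribute computed online) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the window size, and the particular attribute (mean squared state change over the most recent samples) are hypothetical and are not the temporal or spatial forms defined in the paper. The sketch respects the two stated requirements: the attribute uses no future samples (causality) and only a finite window of past samples (locality).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass(frozen=True)
class Transition:
    """Transition component: state, action, and the resulting next state."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    next_state: Tuple[float, ...]


@dataclass(frozen=True)
class CanonicalSample:
    """A transition paired with its attribute (hypothetical scalar attribute)."""
    transition: Transition
    attribute: float


class OnlineAttributeLabeler:
    """Computes an attribute online from a finite window of past transitions."""

    def __init__(self, window: int = 3) -> None:
        # Locality: only the last `window` transitions are ever retained.
        self._recent: Deque[Transition] = deque(maxlen=window)

    def label(self, transition: Transition) -> CanonicalSample:
        # Causality: only the current and earlier transitions are visible here.
        self._recent.append(transition)
        changes = [
            sum((n - s) ** 2 for s, n in zip(t.state, t.next_state))
            for t in self._recent
        ]
        return CanonicalSample(transition, sum(changes) / len(changes))


# Usage with toy one-dimensional transitions (hypothetical data).
labeler = OnlineAttributeLabeler(window=2)
samples = [
    labeler.label(Transition((float(i),), (0.0,), (float(2 * i),)))
    for i in range(3)
]
attributes = [s.attribute for s in samples]
```

Keeping the transition immutable and separate from the attribute mirrors the abstract's distinction between what the plant did and what is annotated about the sample.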


Bounded-Memory Criteria for Streams with Application Time

Schiff, Simon, Özcep, Özgür

arXiv.org Artificial Intelligence

Bounded-memory computability continues to be a focus of those areas of AI and databases that deal with feasible computations over streams, be it feasible arithmetical calculations on low-level streams, feasible query answering for declaratively specified queries on relational data streams, or even feasible query answering for high-level queries on streams w.r.t. a set of constraints in an ontology, such as in the paradigm of Ontology-Based Data Access (OBDA). In classical OBDA, a high-level query is answered by transforming it into a query on data source level. The transformation requires a rewriting step, where knowledge from an ontology is incorporated into the query, followed by an unfolding step with respect to a set of mappings. Given an OBDA setting, it is very difficult to decide whether and how a query can be answered efficiently. In particular, it is difficult to decide whether a query can be answered in bounded memory, i.e., in constant space w.r.t. an infinitely growing prefix of a data stream. This work presents criteria for bounded-memory computability of select-project-join (SPJ) queries over streams with application time. Deciding whether an SPJ query can be answered in constant space is easier than for high-level queries, as neither an ontology nor a set of mappings is part of the input. Using the transformation process of classical OBDA, these criteria can then help decide the efficiency of answering high-level queries on streams.


On the Satisfiability Problem for SPARQL Patterns

Zhang, Xiaowang, Van den Bussche, Jan, Picalausa, François

Journal of Artificial Intelligence Research

The satisfiability problem for SPARQL 1.0 patterns is undecidable in general, since the relational algebra can be emulated using such patterns. The goal of this paper is to delineate the boundary of decidability of satisfiability in terms of the constraints allowed in filter conditions. The classes of constraints considered are bound-constraints, negated bound-constraints, equalities, nonequalities, constant-equalities, and constant-nonequalities. The main result of the paper can be summarized by saying that, as soon as inconsistent filter conditions can be formed, satisfiability is undecidable. The key insight in each case is to find a way to emulate the set difference operation. Undecidability can then be obtained from a known undecidability result for the algebra of binary relations with union, composition, and set difference. When no inconsistent filter conditions can be formed, satisfiability is decidable by syntactic checks on bound variables and on the use of literals. Although the problem is shown to be NP-complete, it is experimentally shown that the checks can be implemented efficiently in practice. The paper also points out that satisfiability for the so-called ‘well-designed’ patterns can be decided by a check on bound variables and a check for inconsistent filter conditions.
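To make the notion of an inconsistent filter condition concrete, the sketch below flags a few obviously contradictory conjunctions of atomic constraints, such as `bound(?x) && !bound(?x)` or `?x = "a" && ?x = "b"`. It is an illustration only: the tuple encoding, the constraint kind names, and the function name are assumptions made here, the check is deliberately incomplete (it ignores equalities between variables), and it is not the decision procedure from the paper.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding of an atomic constraint: (kind, variable, constant).
# kind is one of "bound", "not_bound", "eq_const", "neq_const";
# the constant is ignored for "bound" and "not_bound".
Constraint = Tuple[str, str, str]


def is_trivially_inconsistent(conjunction: Iterable[Constraint]) -> bool:
    """Return True if the conjunction contains an obvious contradiction."""
    bound: Set[str] = set()
    not_bound: Set[str] = set()
    equal_to: Dict[str, Set[str]] = {}
    not_equal_to: Dict[str, Set[str]] = {}

    for kind, var, const in conjunction:
        if kind == "bound":
            bound.add(var)
        elif kind == "not_bound":
            not_bound.add(var)
        elif kind == "eq_const":
            equal_to.setdefault(var, set()).add(const)
        elif kind == "neq_const":
            not_equal_to.setdefault(var, set()).add(const)
        else:
            raise ValueError(f"unknown constraint kind: {kind}")

    # bound(?x) together with !bound(?x)
    if bound & not_bound:
        return True
    for var, consts in equal_to.items():
        # ?x = "a" together with ?x = "b" for distinct constants
        if len(consts) > 1:
            return True
        # ?x = "a" together with ?x != "a"
        if consts & not_equal_to.get(var, set()):
            return True
    return False


# Usage: the first conjunction is contradictory, the second is not.
contradictory = is_trivially_inconsistent(
    [("eq_const", "?x", "a"), ("eq_const", "?x", "b")]
)
consistent = is_trivially_inconsistent(
    [("bound", "?x", ""), ("eq_const", "?x", "a"), ("neq_const", "?x", "b")]
)
```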


On the satisfiability problem for SPARQL patterns

Zhang, Xiaowang, Van den Bussche, Jan, Picalausa, François

arXiv.org Artificial Intelligence

The satisfiability problem for SPARQL patterns is undecidable in general, since the expressive power of SPARQL 1.0 is comparable with that of the relational algebra. The goal of this paper is to delineate the boundary of decidability of satisfiability in terms of the constraints allowed in filter conditions. The classes of constraints considered are bound-constraints, negated bound-constraints, equalities, nonequalities, constant-equalities, and constant-nonequalities. The main result of the paper can be summarized by saying that, as soon as inconsistent filter conditions can be formed, satisfiability is undecidable. The key insight in each case is to find a way to emulate the set difference operation. Undecidability can then be obtained from a known undecidability result for the algebra of binary relations with union, composition, and set difference. When no inconsistent filter conditions can be formed, satisfiability is efficiently decidable by simple checks on bound variables and on the use of literals. The paper also points out that satisfiability for the so-called ‘well-designed’ patterns can be decided by a check on bound variables and a check for inconsistent filter conditions.